Purpose: to extract and visualise tweets and re-tweets of #dockercon for 17 - 21 April 2017 (DockerCon17).
The code borrows extensively from http://thinktostart.com/twitter-authentification-with-r/
The data should already have been downloaded using collectData.R. After some processing, this produces a data table with the following variables:
## [1] "text" "favorited" "favoriteCount"
## [4] "replyToSN" "created" "truncated"
## [7] "replyToSID" "id" "replyToUID"
## [10] "statusSource" "screenName" "retweetCount"
## [13] "isRetweet" "retweeted" "longitude"
## [16] "latitude" "location" "language"
## [19] "profileImageURL" "createdLocal" "obsDateTimeMins"
## [22] "obsDateTimeHours" "obsDateTime5m" "obsDateTime10m"
## [25] "obsDateTime15m" "obsDate" "isRetweetLab"
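The derived time variables in this list could be built along the following lines. This is a minimal sketch in base R, not the code from collectData.R: the toy timestamps, the assumption that created is stored in UTC, the America/Chicago timezone, the helper floorMins() and the labels used for isRetweetLab are all illustrative assumptions.

```r
# Toy data standing in for the downloaded tweets (values are made up)
tweets <- data.frame(
  created = as.POSIXct(c("2017-04-18 14:02:10", "2017-04-18 14:04:59",
                         "2017-04-18 14:07:30"), tz = "UTC"),
  isRetweet = c(FALSE, TRUE, FALSE)
)

# Same instants, displayed in local (Austin) time; assumes 'created' is UTC
createdLocal <- tweets$created
attr(createdLocal, "tzone") <- "America/Chicago"
tweets$createdLocal <- createdLocal

# Hypothetical helper: round a timestamp down to the start of its n-minute bin
floorMins <- function(x, mins) {
  secs <- mins * 60
  as.POSIXct(floor(as.numeric(x) / secs) * secs,
             origin = "1970-01-01", tz = attr(x, "tzone"))
}

tweets$obsDateTime5m  <- floorMins(tweets$createdLocal, 5)
tweets$obsDateTime15m <- floorMins(tweets$createdLocal, 15)
tweets$obsDate        <- as.Date(tweets$createdLocal, tz = "America/Chicago")

# Assumed labels; the report does not show the actual values used
tweets$isRetweetLab <- ifelse(tweets$isRetweet, "Re-tweet", "Tweet")
```

In the report itself lubridate does the timezone work; base R is used here only to keep the sketch free of package dependencies.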
The table has 7,485 tweets (and 9,890 re-tweets) from 5,690 tweeters between 2017-04-16 19:01:03 and 2017-04-19 22:57:43 (Central Daylight Time).
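Summary figures like these can be computed directly from the table. The sketch below uses made-up rows and base R; the column names follow the variable list above, while the assumption that created is in UTC and the America/Chicago timezone are ours.

```r
# Toy rows (made up); only the columns needed for the summary
tweets <- data.frame(
  screenName = c("a", "b", "a", "c"),
  isRetweet  = c(FALSE, TRUE, FALSE, TRUE),
  created    = as.POSIXct(c("2017-04-17 00:01:03", "2017-04-17 09:30:00",
                            "2017-04-18 15:00:00", "2017-04-20 03:57:43"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

nTweets   <- sum(!tweets$isRetweet)            # original tweets
nRetweets <- sum(tweets$isRetweet)             # re-tweets
nTweeters <- length(unique(tweets$screenName)) # distinct screen names

# First and last observation, formatted in local (Austin) time
firstLocal <- format(min(tweets$created), tz = "America/Chicago", usetz = TRUE)
lastLocal  <- format(max(tweets$created), tz = "America/Chicago", usetz = TRUE)

summaryText <- sprintf(
  "The table has %s tweets (and %s re-tweets) from %s tweeters between %s and %s.",
  format(nTweets, big.mark = ","), format(nRetweets, big.mark = ","),
  format(nTweeters, big.mark = ","), firstLocal, lastLocal
)
print(summaryText)
```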
All (re)tweets containing #dockercon 2017-04-17 to 2017-04-20
All (re)tweets containing #dockercon Monday 17th April 2017
All (re)tweets containing #dockercon Tuesday 18th April 2017
All (re)tweets containing #dockercon Wednesday 19th April 2017
We wanted to make a nice map but sadly most (re)tweets (17,325 of 17,375) have no lat/long set.
| latitude | longitude | nTweets |
|---|---|---|
| NA | NA | 17325 |
| 30.26416397 | -97.73961067 | 2 |
| 30.26857 | -97.73617 | 1 |
| 30.2625 | -97.7401 | 28 |
| 30.26470908 | -97.7417368 | 1 |
| 30.20226566 | -97.66722505 | 1 |
| 42.36488267 | -71.02168356 | 1 |
| 37.61697678 | -122.38427689 | 1 |
| 30.2672 | -97.7639 | 3 |
| 30.2635554 | -97.7399303 | 1 |
| 30.2591 | -97.7384 | 1 |
| 30.26622515 | -97.74327721 | 1 |
| 30.26037 | -97.73848 | 2 |
| 30.258201 | -97.71264 | 1 |
| 30.25888 | -97.73841 | 2 |
| 30.259714 | -97.73940054 | 1 |
| 30.26006 | -97.73813 | 1 |
| 30.26006 | -97.73859 | 1 |
| 30.26036009 | -97.73848483 | 1 |
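A table like the one above can be produced by counting rows per lat/long pair. Here is a rough sketch on made-up coordinates, in base R rather than the data.table code the report presumably uses; the hasGeo flag and the nTweets helper column are illustrative names.

```r
# Toy coordinates (made up); NA means the tweet carried no geo-tag
tweets <- data.frame(
  latitude  = c(NA, 30.2625, 30.2625, NA, 42.36488267),
  longitude = c(NA, -97.7401, -97.7401, NA, -71.02168356)
)

# A tweet is usable for mapping only if both coordinates are present
hasGeo <- !is.na(tweets$latitude) & !is.na(tweets$longitude)
nGeo   <- sum(hasGeo)
pctGeo <- 100 * mean(hasGeo)

# Count tweets per coordinate pair among the geo-tagged rows
geoTweets <- cbind(tweets[hasGeo, ], nTweets = 1)
geoCounts <- aggregate(nTweets ~ latitude + longitude, data = geoTweets, FUN = sum)

print(geoCounts)
cat(sprintf("%d of %d rows are geo-tagged (%.1f%%)\n", nGeo, nrow(tweets), pctGeo))
```

Applied to the counts in the table above, the same calculation gives 50 geo-tagged (re)tweets out of 17,375, or roughly 0.3%.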
The location variable appears to be pulled from the user’s profile, although it may also be a ‘guesstimate’ of current location.
Top locations for tweets:
| location | nTweets |
|---|---|
| NA | 2603 |
| San Francisco, CA | 1282 |
| San Francisco | 501 |
| Austin, TX | 329 |
| Seattle, WA | 227 |
| Silicon Valley, CA | 198 |
| Paris | 171 |
| Islamabad, Pakistan | 146 |
| London | 126 |
| New York, NY | 123 |
| Charlotte, NC | 120 |
| San Jose, CA | 114 |
| west tokyo | 104 |
| USA | 104 |
| Boulder, CO | 103 |
Top locations for tweeters:
| location | nTweeters |
|---|---|
| NA | 1036 |
| San Francisco, CA | 172 |
| Austin, TX | 85 |
| San Francisco | 61 |
| Seattle, WA | 48 |
| New York, NY | 40 |
| San Jose, CA | 40 |
| Paris | 38 |
| London, England | 34 |
| Palo Alto, CA | 29 |
| Paris, France | 29 |
| New York | 27 |
| Boston, MA | 26 |
| London | 25 |
| Washington, DC | 24 |
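The two tables differ because the first counts every (re)tweet while the second counts each tweeter once. A small sketch of that distinction, on made-up rows and in base R, follows; it assumes each screen name has a single location across all of its tweets, which may not hold in the real data.

```r
# Toy rows (made up): one row per (re)tweet
tweets <- data.frame(
  screenName = c("a", "a", "a", "b", "c", "d"),
  location   = c("Paris", "Paris", "Paris", "Paris", "London", NA),
  stringsAsFactors = FALSE
)

# Tweets per location; NA is kept as its own category, as in the tables above
tweetCounts <- sort(table(tweets$location, useNA = "ifany"), decreasing = TRUE)

# Reduce to one row per tweeter before counting
tweeters      <- unique(tweets[, c("screenName", "location")])
tweeterCounts <- sort(table(tweeters$location, useNA = "ifany"), decreasing = TRUE)

print(tweetCounts)
print(tweeterCounts)
```

Because location is free text, variants such as "San Francisco, CA" and "San Francisco" are counted separately in both tables; the sketch does not attempt to merge them.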
Next we’ll try by screen name.
Top tweeters:
| screenName | nTweets |
|---|---|
| DockerCon | 336 |
| theCUBE | 161 |
| climbingkujira | 128 |
| BettyJunod | 124 |
| jpetazzo | 124 |
| solomonstre | 107 |
| jeanepaul | 104 |
| ManoMarks | 92 |
| OpenShiftNinja | 88 |
| sitspak | 85 |
| vmblog | 79 |
| kaslinfields | 79 |
| SFoskett | 79 |
| jameskobielus | 75 |
| bsmith626 | 73 |
And here’s a really bad visualisation of all of them!
N tweets per 5 minutes by screen name
So let’s re-do that for the top 50 tweeters.
N tweets per 5 minutes by screen name (top 50, most prolific tweeters at bottom)
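Restricting the plot to the most prolific tweeters comes down to ranking screen names by tweet count and filtering. The sketch below shows one way to do it in base R on made-up data; topN is set to 2 only so the toy example is readable (the report uses 50), and the variable names are illustrative.

```r
# Toy rows (made up): one row per (re)tweet
tweets <- data.frame(
  screenName = c("a", "b", "a", "c", "a", "b", "d"),
  stringsAsFactors = FALSE
)

topN <- 2  # the report uses the top 50

# Rank screen names by number of (re)tweets, most prolific first
counts   <- sort(table(tweets$screenName), decreasing = TRUE)
topNames <- names(counts)[seq_len(min(topN, length(counts)))]

# Keep only rows from the top tweeters
topTweets <- tweets[tweets$screenName %in% topNames, , drop = FALSE]

# Fix the factor level order so a plot keeps the ranking; with a discrete
# y-axis, ggplot2 draws the first level at the bottom
topTweets$screenName <- factor(topTweets$screenName, levels = topNames)

print(table(topTweets$screenName))
```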
Analysis completed in: 37.65 seconds using knitr in RStudio with R version 3.3.3 (2017-03-06) running on x86_64-apple-darwin13.4.0.
A special mention must go to twitteR (Gentry, n.d.) for the Twitter API interaction functions and to lubridate (Grolemund and Wickham 2011), which allows timezone manipulation without tears.
Other R packages used:
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.